fastText

Aligned word vectors

We are publishing aligned word vectors for 44 languages, based on the pre-trained vectors computed on Wikipedia using fastText. The alignments are performed with the RCSLS method described in Joulin et al. (2018) [1].
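
Because aligned vectors share a common space, a vector from one language can be compared directly with vectors from another. The sketch below illustrates this with plain cosine nearest-neighbor search. It is only an illustration: the toy vectors and the helper names `cosine` and `nearest` are made up for this example, and plain cosine retrieval is a simplification, not the retrieval criterion used in [1].

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query: List[float], candidates: Dict[str, List[float]]) -> str:
    """Return the candidate word whose vector is most similar to the query."""
    return max(candidates, key=lambda word: cosine(query, candidates[word]))


# Toy 3-dimensional vectors (hypothetical values, not taken from the real files).
toy_en = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]}
toy_fr = {"chat": [0.8, 0.2, 0.1], "chien": [0.2, 0.8, 0.1]}

for word, vector in toy_en.items():
    print(word, "->", nearest(vector, toy_fr))
```

With real data, `toy_en` and `toy_fr` would be replaced by vectors loaded from two of the aligned files listed below.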

Vectors

The aligned vectors can be downloaded from:

Afrikaans: text     Arabic: text        Bulgarian: text     Bengali: text
Bosnian: text       Catalan: text       Czech: text         Danish: text
German: text        Greek: text         English: text       Spanish: text
Estonian: text      Persian: text       Finnish: text       French: text
Hebrew: text        Hindi: text         Croatian: text      Hungarian: text
Indonesian: text    Italian: text       Korean: text        Lithuanian: text
Latvian: text       Macedonian: text    Malay: text         Dutch: text
Norwegian: text     Polish: text        Portuguese: text    Romanian: text
Russian: text       Slovak: text        Slovenian: text     Albanian: text
Swedish: text       Tamil: text         Thai: text          Tagalog: text
Turkish: text       Ukrainian: text     Vietnamese: text    Chinese: text

Format

The word vectors come in the default text format of fastText. The first line gives the number of vectors and their dimension. Each subsequent line contains a word followed by the components of its vector, separated by spaces.
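
The format described above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the function name `load_vectors`, its `limit` parameter, and the in-memory sample are assumptions made for illustration.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def load_vectors(
    stream: TextIO, limit: Optional[int] = None
) -> Tuple[int, int, Dict[str, List[float]]]:
    """Parse word vectors in the fastText text format from an open text stream.

    Returns the vector count and dimension announced in the header,
    plus a dictionary mapping each word to its list of floats.
    """
    # Header line: "<number of vectors> <dimension>"
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break
        # Strip the newline and any trailing space, then split on single spaces.
        tokens = line.rstrip().split(" ")
        if len(tokens) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(tokens) - 1}")
        vectors[tokens[0]] = [float(value) for value in tokens[1:]]
    return count, dim, vectors


# Tiny in-memory sample standing in for a downloaded file (values are made up).
sample = "2 3\nhello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n"
count, dim, vectors = load_vectors(io.StringIO(sample))
print(count, dim, sorted(vectors))
```

For a downloaded file, the same function can be given a handle opened with `open(path, "r", encoding="utf-8", newline="\n")`, where `path` is wherever the file was saved. Passing a small `limit` avoids loading a full vocabulary into memory when only the most frequent words are needed.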

License

The word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.

References

If you use these word vectors, please cite the following papers:

[1] A. Joulin, P. Bojanowski, T. Mikolov, H. Jegou, E. Grave, Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

@InProceedings{joulin2018loss,
  title={Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion},
  author={Joulin, Armand and Bojanowski, Piotr and Mikolov, Tomas and J\'egou, Herv\'e and Grave, Edouard},
  year={2018},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
}

[2] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2017enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={Transactions of the Association for Computational Linguistics},
  volume={5},
  year={2017},
  issn={2307-387X},
  pages={135--146}
}
Facebook Open Source
Copyright © 2022 Facebook Inc.